Dataset statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 712 | 179 |
| Missing cells | 695 | 171 |
| Missing cells (%) | 8.1% | 8.0% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 72.3 KiB | 18.2 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
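The missing-cell percentage follows from the other rows: missing cells divided by the total number of cells (observations × variables). A minimal check using only the figures in the table above; the dictionary layout and function name are illustrative, not part of the report.

```python
# Figures copied from the Dataset statistics table.
datasets = {
    "train": {"variables": 12, "observations": 712, "missing_cells": 695},
    "test": {"variables": 12, "observations": 179, "missing_cells": 171},
}


def missing_cell_pct(stats):
    """Share of missing cells, in percent, over all cells in the table."""
    total_cells = stats["variables"] * stats["observations"]
    return 100 * stats["missing_cells"] / total_cells


for name, stats in datasets.items():
    print(f"{name}: {missing_cell_pct(stats):.1f}% missing cells")
```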
Variable types
| | Train Dataset | Test Dataset |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
| Train Dataset | Test Dataset | Alert type |
|---|---|---|
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High Correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High Correlation |
| Age has 140 (19.7%) missing values | Age has 37 (20.7%) missing values | Missing |
| Cabin has 553 (77.7%) missing values | Cabin has 134 (74.9%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 484 (68.0%) zeros | SibSp has 124 (69.3%) zeros | Zeros |
| Parch has 541 (76.0%) zeros | Parch has 137 (76.5%) zeros | Zeros |
| Fare has 13 (1.8%) zeros | Fare has 2 (1.1%) zeros | Zeros |
| Alert not present in this dataset | Fare is highly overall correlated with Pclass | High Correlation |
| Alert not present in this dataset | Pclass is highly overall correlated with Fare | High Correlation |
Reproduction
| | Train Dataset | Test Dataset |
|---|---|---|
| Analysis started | 2023-08-08 08:04:41.455563 | 2023-08-08 08:04:50.192450 |
| Analysis finished | 2023-08-08 08:04:50.175814 | 2023-08-08 08:04:58.102956 |
| Duration | 8.72 seconds | 7.91 seconds |
| Software version | ydata-profiling v4.4.0 | ydata-profiling v4.4.0 |
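A comparison report of this kind can be produced with ydata-profiling's `ProfileReport.compare`. The sketch below is an assumption about how such a report might be generated, not the configuration actually used here: the CSV paths, the output file name, and the function name are hypothetical.

```python
def build_comparison_report(train_csv, test_csv, out_html="comparison.html"):
    """Profile two CSV files and write a side-by-side comparison report.

    The file paths are placeholders; this report does not name the
    original data files.
    """
    # Imported lazily so the module loads even without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    train_report = ProfileReport(pd.read_csv(train_csv), title="Train Dataset")
    test_report = ProfileReport(pd.read_csv(test_csv), title="Test Dataset")

    # compare() merges the two profiles into a single comparison report.
    comparison = train_report.compare(test_report)
    comparison.to_file(out_html)
    return comparison
```

Calling `build_comparison_report("train.csv", "test.csv")` would write one HTML file with both datasets shown in adjacent columns.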
PassengerId
Real number (ℝ)
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 712 | 179 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 448.23455 | 437.11173 |
| | Train Dataset | Test Dataset |
|---|---|---|
| Minimum | 1 | 6 |
| Maximum | 891 | 890 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Quantile statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Minimum | 1 | 6 |
| 5-th percentile | 45.1 | 49.5 |
| Q1 | 224.75 | 217.5 |
| median | 453.5 | 423 |
| Q3 | 673.5 | 656 |
| 95-th percentile | 845.9 | 846.4 |
| Maximum | 891 | 890 |
| Range | 890 | 884 |
| Interquartile range (IQR) | 448.75 | 438.5 |
Descriptive statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Standard deviation | 256.73142 | 260.34933 |
| Coefficient of variation (CV) | 0.57276134 | 0.59561277 |
| Kurtosis | -1.2053753 | -1.1647939 |
| Mean | 448.23455 | 437.11173 |
| Median Absolute Deviation (MAD) | 224.5 | 212 |
| Skewness | -0.027340115 | 0.10807588 |
| Sum | 319143 | 78243 |
| Variance | 65911.024 | 67781.774 |
| Monotonicity | Not monotonic | Not monotonic |
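Several rows in the quantile and descriptive tables are derived from others: range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, and variance = standard deviation squared. A small consistency check on the PassengerId figures; the names used are illustrative.

```python
# Values copied from the PassengerId tables.
stats = {
    "train": {"min": 1, "max": 891, "q1": 224.75, "q3": 673.5,
              "mean": 448.23455, "std": 256.73142},
    "test": {"min": 6, "max": 890, "q1": 217.5, "q3": 656,
             "mean": 437.11173, "std": 260.34933},
}


def derived(s):
    """Recompute the statistics that follow from the other rows."""
    return {
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],
        "cv": s["std"] / s["mean"],
        "variance": s["std"] ** 2,
    }


for name, s in stats.items():
    print(name, derived(s))
```

The recomputed values agree with the reported ones up to the rounding of the printed inputs.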
| Value | Count | Frequency (%) |
| 332 | 1 | 0.1% |
| 673 | 1 | 0.1% |
| 610 | 1 | 0.1% |
| 280 | 1 | 0.1% |
| 294 | 1 | 0.1% |
| 401 | 1 | 0.1% |
| 123 | 1 | 0.1% |
| 184 | 1 | 0.1% |
| 203 | 1 | 0.1% |
| 439 | 1 | 0.1% |
| Other values (702) | 702 |
| Value | Count | Frequency (%) |
| 710 | 1 | 0.6% |
| 600 | 1 | 0.6% |
| 528 | 1 | 0.6% |
| 877 | 1 | 0.6% |
| 97 | 1 | 0.6% |
| 293 | 1 | 0.6% |
| 324 | 1 | 0.6% |
| 737 | 1 | 0.6% |
| 530 | 1 | 0.6% |
| 219 | 1 | 0.6% |
| Other values (169) | 169 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 12 | 1 |
| Value | Count | Frequency (%) |
| 6 | 1 | |
| 11 | 1 | |
| 24 | 1 | |
| 26 | 1 | |
| 31 | 1 | |
| 32 | 1 | |
| 34 | 1 | |
| 40 | 1 | |
| 45 | 1 | |
| 50 | 1 |
Pclass
Categorical
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.4% | 1.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Length
| | Train Dataset | Test Dataset |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Train Dataset | Test Dataset |
|---|---|---|
| Total characters | 712 | 179 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Train Dataset | Test Dataset |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Train Dataset | Test Dataset |
|---|---|---|
| 1st row | 1 | 3 |
| 2nd row | 2 | 2 |
| 3rd row | 3 | 3 |
| 4th row | 3 | 2 |
| 5th row | 3 | 3 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 398 | 55.9% |
| 1 | 163 | 22.9% |
| 2 | 151 | 21.2% |
| Value | Count | Frequency (%) |
| 3 | 93 | 52.0% |
| 1 | 53 | 29.6% |
| 2 | 33 | 18.4% |
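Each Frequency (%) value is the count divided by the number of observations in that dataset (712 for train, 179 for test), so a percentage missing from a row can be recovered from the counts. A minimal sketch with the Pclass counts; the function name is illustrative.

```python
def frequency_table(counts):
    """Return {value: percentage}, each count divided by the total."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


# Counts copied from the Pclass tables.
train_counts = {3: 398, 1: 163, 2: 151}
test_counts = {3: 93, 1: 53, 2: 33}

print(frequency_table(train_counts))
print(frequency_table(test_counts))
```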
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 398 | |
| 1 | 163 | |
| 2 | 151 | 21.2% |
| Value | Count | Frequency (%) |
| 3 | 93 | |
| 1 | 53 | |
| 2 | 33 | 18.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 712 |
| Value | Count | Frequency (%) |
| Decimal Number | 179 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 398 | |
| 1 | 163 | |
| 2 | 151 | 21.2% |
| Value | Count | Frequency (%) |
| 3 | 93 | |
| 1 | 53 | |
| 2 | 33 | 18.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 712 |
| Value | Count | Frequency (%) |
| Common | 179 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 398 | |
| 1 | 163 | |
| 2 | 151 | 21.2% |
| Value | Count | Frequency (%) |
| 3 | 93 | |
| 1 | 53 | |
| 2 | 33 | 18.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 712 |
| Value | Count | Frequency (%) |
| ASCII | 179 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 398 | |
| 1 | 163 | |
| 2 | 151 | 21.2% |
| Value | Count | Frequency (%) |
| 3 | 93 | |
| 1 | 53 | |
| 2 | 33 | 18.4% |
Name
Text
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 712 | 179 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Length
| | Train Dataset | Test Dataset |
|---|---|---|
| Max length | 82 | 61 |
| Median length | 52 | 45 |
| Mean length | 26.768258 | 27.748603 |
| Min length | 12 | 14 |
Characters and Unicode
| | Train Dataset | Test Dataset |
|---|---|---|
| Total characters | 19059 | 4967 |
| Distinct characters | 60 | 57 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Train Dataset | Test Dataset |
|---|---|---|
| Unique | 712 | 179 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Train Dataset | Test Dataset |
|---|---|---|
| 1st row | Partner, Mr. Austen | Moubarek, Master. Halim Gonios ("William George") |
| 2nd row | Berriman, Mr. William John | Kvillner, Mr. Johan Henrik Johannesson |
| 3rd row | Tikkanen, Mr. Juho | Alhomaki, Mr. Ilmari Rudolf |
| 4th row | Hansen, Mr. Henrik Juul | Harper, Miss. Annie Jessie "Nina" |
| 5th row | Andersson, Miss. Ebba Iris Alfrida | Nicola-Yarred, Miss. Jamila |
| Value | Count | Frequency (%) |
| mr | 421 | 14.6% |
| miss | 143 | 5.0% |
| mrs | 100 | 3.5% |
| william | 52 | 1.8% |
| john | 36 | 1.3% |
| master | 33 | 1.1% |
| henry | 32 | 1.1% |
| charles | 20 | 0.7% |
| thomas | 20 | 0.7% |
| george | 16 | 0.6% |
| Other values (1260) | 2006 |
| Value | Count | Frequency (%) |
| mr | 100 | 13.4% |
| miss | 39 | 5.2% |
| mrs | 29 | 3.9% |
| william | 12 | 1.6% |
| george | 8 | 1.1% |
| james | 8 | 1.1% |
| john | 8 | 1.1% |
| master | 7 | 0.9% |
| mary | 5 | 0.7% |
| margaret | 5 | 0.7% |
| Other values (435) | 524 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 2169 | 11.4% |
| r | 1551 | 8.1% |
| e | 1328 | 7.0% |
| a | 1324 | 6.9% |
| i | 1065 | 5.6% |
| s | 1041 | 5.5% |
| n | 1024 | 5.4% |
| M | 898 | 4.7% |
| l | 841 | 4.4% |
| o | 802 | 4.2% |
| Other values (50) | 7016 |
| Value | Count | Frequency (%) |
| (space) | 566 | 11.4% |
| r | 407 | 8.2% |
| e | 375 | 7.5% |
| a | 333 | 6.7% |
| n | 280 | 5.6% |
| i | 260 | 5.2% |
| s | 256 | 5.2% |
| M | 230 | 4.6% |
| l | 226 | 4.6% |
| o | 206 | 4.1% |
| Other values (47) | 1828 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 12269 | |
| Uppercase Letter | 2893 | 15.2% |
| Space Separator | 2169 | 11.4% |
| Other Punctuation | 1504 | 7.9% |
| Open Punctuation | 107 | 0.6% |
| Close Punctuation | 107 | 0.6% |
| Dash Punctuation | 10 | 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 3177 | |
| Uppercase Letter | 752 | 15.1% |
| Space Separator | 566 | 11.4% |
| Other Punctuation | 395 | 8.0% |
| Close Punctuation | 37 | 0.7% |
| Open Punctuation | 37 | 0.7% |
| Dash Punctuation | 3 | 0.1% |
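The category names in these tables are Unicode general categories. As a rough illustration of how such counts arise, Python's standard `unicodedata` module can classify the characters of a single value; the mapping of short codes to long names below covers only the categories that appear in this report, and the function name is illustrative.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    codes = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}


# First sample value of the train Name column.
print(category_counts("Partner, Mr. Austen"))
```

Summing such per-value counts over a whole column gives totals of the kind shown above.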
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2169 | 100.0% |
| Value | Count | Frequency (%) |
| (space) | 566 | 100.0% |
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 1551 | |
| e | 1328 | |
| a | 1324 | |
| i | 1065 | |
| s | 1041 | |
| n | 1024 | |
| l | 841 | 6.9% |
| o | 802 | 6.5% |
| t | 527 | 4.3% |
| h | 412 | 3.4% |
| Other values (16) | 2354 |
| Value | Count | Frequency (%) |
| r | 407 | |
| e | 375 | |
| a | 333 | |
| n | 280 | |
| i | 260 | |
| s | 256 | |
| l | 226 | 7.1% |
| o | 206 | 6.5% |
| t | 140 | 4.4% |
| h | 105 | 3.3% |
| Other values (15) | 589 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 898 | |
| A | 195 | 6.7% |
| J | 169 | 5.8% |
| H | 162 | 5.6% |
| S | 146 | 5.0% |
| C | 131 | 4.5% |
| E | 124 | 4.3% |
| W | 119 | 4.1% |
| B | 113 | 3.9% |
| L | 109 | 3.8% |
| Other values (15) | 727 |
| Value | Count | Frequency (%) |
| M | 230 | |
| A | 55 | 7.3% |
| J | 46 | 6.1% |
| E | 42 | 5.6% |
| C | 41 | 5.5% |
| H | 41 | 5.5% |
| S | 34 | 4.5% |
| G | 28 | 3.7% |
| F | 27 | 3.6% |
| B | 27 | 3.6% |
| Other values (14) | 181 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 713 | |
| , | 712 | |
| " | 70 | 4.7% |
| ' | 8 | 0.5% |
| / | 1 | 0.1% |
| Value | Count | Frequency (%) |
| . | 179 | |
| , | 179 | |
| " | 36 | 9.1% |
| ' | 1 | 0.3% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 107 |
| Value | Count | Frequency (%) |
| ( | 37 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 107 |
| Value | Count | Frequency (%) |
| ) | 37 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 10 |
| Value | Count | Frequency (%) |
| - | 3 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 15162 | |
| Common | 3897 | 20.4% |
| Value | Count | Frequency (%) |
| Latin | 3929 | |
| Common | 1038 | 20.9% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 2169 | 55.7% |
| . | 713 | 18.3% |
| , | 712 | 18.3% |
| ( | 107 | 2.7% |
| ) | 107 | 2.7% |
| " | 70 | 1.8% |
| - | 10 | 0.3% |
| ' | 8 | 0.2% |
| / | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| (space) | 566 | 54.5% |
| . | 179 | 17.2% |
| , | 179 | 17.2% |
| ) | 37 | 3.6% |
| ( | 37 | 3.6% |
| " | 36 | 3.5% |
| - | 3 | 0.3% |
| ' | 1 | 0.1% |
Latin
| Value | Count | Frequency (%) |
| r | 1551 | 10.2% |
| e | 1328 | 8.8% |
| a | 1324 | 8.7% |
| i | 1065 | 7.0% |
| s | 1041 | 6.9% |
| n | 1024 | 6.8% |
| M | 898 | 5.9% |
| l | 841 | 5.5% |
| o | 802 | 5.3% |
| t | 527 | 3.5% |
| Other values (41) | 4761 |
| Value | Count | Frequency (%) |
| r | 407 | 10.4% |
| e | 375 | 9.5% |
| a | 333 | 8.5% |
| n | 280 | 7.1% |
| i | 260 | 6.6% |
| s | 256 | 6.5% |
| M | 230 | 5.9% |
| l | 226 | 5.8% |
| o | 206 | 5.2% |
| t | 140 | 3.6% |
| Other values (39) | 1216 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 19059 |
| Value | Count | Frequency (%) |
| ASCII | 4967 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 2169 | 11.4% |
| r | 1551 | 8.1% |
| e | 1328 | 7.0% |
| a | 1324 | 6.9% |
| i | 1065 | 5.6% |
| s | 1041 | 5.5% |
| n | 1024 | 5.4% |
| M | 898 | 4.7% |
| l | 841 | 4.4% |
| o | 802 | 4.2% |
| Other values (50) | 7016 |
| Value | Count | Frequency (%) |
| (space) | 566 | 11.4% |
| r | 407 | 8.2% |
| e | 375 | 7.5% |
| a | 333 | 6.7% |
| n | 280 | 5.6% |
| i | 260 | 5.2% |
| s | 256 | 5.2% |
| M | 230 | 4.6% |
| l | 226 | 4.6% |
| o | 206 | 4.1% |
| Other values (47) | 1828 |
Sex
Categorical
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.3% | 1.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Length
| | Train Dataset | Test Dataset |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.6882022 | 4.7709497 |
| Min length | 4 | 4 |
Characters and Unicode
| | Train Dataset | Test Dataset |
|---|---|---|
| Total characters | 3338 | 854 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Train Dataset | Test Dataset |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Train Dataset | Test Dataset |
|---|---|---|
| 1st row | male | male |
| 2nd row | male | male |
| 3rd row | male | male |
| 4th row | male | female |
| 5th row | female | female |
Common Values
| Value | Count | Frequency (%) |
| male | 467 | 65.6% |
| female | 245 | 34.4% |
| Value | Count | Frequency (%) |
| male | 110 | 61.5% |
| female | 69 | 38.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 957 | |
| m | 712 | |
| a | 712 | |
| l | 712 | |
| f | 245 | 7.3% |
| Value | Count | Frequency (%) |
| e | 248 | |
| m | 179 | |
| a | 179 | |
| l | 179 | |
| f | 69 | 8.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 3338 |
| Value | Count | Frequency (%) |
| Lowercase Letter | 854 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 957 | |
| m | 712 | |
| a | 712 | |
| l | 712 | |
| f | 245 | 7.3% |
| Value | Count | Frequency (%) |
| e | 248 | |
| m | 179 | |
| a | 179 | |
| l | 179 | |
| f | 69 | 8.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 3338 |
| Value | Count | Frequency (%) |
| Latin | 854 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 957 | |
| m | 712 | |
| a | 712 | |
| l | 712 | |
| f | 245 | 7.3% |
| Value | Count | Frequency (%) |
| e | 248 | |
| m | 179 | |
| a | 179 | |
| l | 179 | |
| f | 69 | 8.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3338 |
| Value | Count | Frequency (%) |
| ASCII | 854 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 957 | |
| m | 712 | |
| a | 712 | |
| l | 712 | |
| f | 245 | 7.3% |
| Value | Count | Frequency (%) |
| e | 248 | |
| m | 179 | |
| a | 179 | |
| l | 179 | |
| f | 69 | 8.1% |
Age
Real number (ℝ)
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 83 | 56 |
| Distinct (%) | 14.5% | 39.4% |
| Missing | 140 | 37 |
| Missing (%) | 19.7% | 20.7% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.498846 | 30.505845 |
| | Train Dataset | Test Dataset |
|---|---|---|
| Minimum | 0.42 | 0.83 |
| Maximum | 80 | 71 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Quantile statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Minimum | 0.42 | 0.83 |
| 5-th percentile | 3.55 | 9 |
| Q1 | 21 | 20 |
| median | 28 | 29 |
| Q3 | 38 | 38.75 |
| 95-th percentile | 55.225 | 60.85 |
| Maximum | 80 | 71 |
| Range | 79.58 | 70.17 |
| Interquartile range (IQR) | 17 | 18.75 |
Descriptive statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Standard deviation | 14.500059 | 14.656239 |
| Coefficient of variation (CV) | 0.49154665 | 0.48044036 |
| Kurtosis | 0.14923338 | 0.27728386 |
| Mean | 29.498846 | 30.505845 |
| Median Absolute Deviation (MAD) | 8 | 9 |
| Skewness | 0.33100174 | 0.62293002 |
| Sum | 16873.34 | 4331.83 |
| Variance | 210.25171 | 214.80535 |
| Monotonicity | Not monotonic | Not monotonic |
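For a column with missing values, the figures above are consistent with two different denominators: Missing (%) is taken over all observations, while Distinct (%) and the mean are taken over the non-missing values only. A short check on the Age figures; the names are illustrative.

```python
# Figures copied from the Age tables.
age = {
    "train": {"observations": 712, "missing": 140, "distinct": 83, "sum": 16873.34},
    "test": {"observations": 179, "missing": 37, "distinct": 56, "sum": 4331.83},
}


def summarize(s):
    """Recompute percentages and the mean with the appropriate denominators."""
    present = s["observations"] - s["missing"]  # non-missing values
    return {
        "present": present,
        "missing_pct": round(100 * s["missing"] / s["observations"], 1),
        "distinct_pct": round(100 * s["distinct"] / present, 1),
        "mean": s["sum"] / present,
    }


for name, s in age.items():
    print(name, summarize(s))
```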
| Value | Count | Frequency (%) |
| 24 | 26 | 3.7% |
| 22 | 23 | 3.2% |
| 28 | 21 | 2.9% |
| 25 | 21 | 2.9% |
| 18 | 20 | 2.8% |
| 30 | 20 | 2.8% |
| 19 | 19 | 2.7% |
| 21 | 19 | 2.7% |
| 29 | 16 | 2.2% |
| 36 | 15 | 2.1% |
| Other values (73) | 372 | |
| (Missing) | 140 | 19.7% |
| Value | Count | Frequency (%) |
| 36 | 7 | 3.9% |
| 19 | 6 | 3.4% |
| 18 | 6 | 3.4% |
| 23 | 5 | 2.8% |
| 21 | 5 | 2.8% |
| 16 | 5 | 2.8% |
| 30 | 5 | 2.8% |
| 40 | 5 | 2.8% |
| 24 | 4 | 2.2% |
| 35 | 4 | 2.2% |
| Other values (46) | 90 | |
| (Missing) | 37 | 20.7% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.1% |
| 0.67 | 1 | 0.1% |
| 0.75 | 2 | 0.3% |
| 0.83 | 1 | 0.1% |
| 0.92 | 1 | 0.1% |
| 1 | 7 | |
| 2 | 10 | |
| 3 | 6 | |
| 4 | 8 | |
| 5 | 2 | 0.3% |
| Value | Count | Frequency (%) |
| 0.83 | 1 | 0.6% |
| 4 | 2 | |
| 5 | 2 | |
| 6 | 1 | 0.6% |
| 9 | 3 | |
| 10 | 1 | 0.6% |
| 11 | 1 | 0.6% |
| 13 | 1 | 0.6% |
| 14 | 1 | 0.6% |
| 15 | 1 | 0.6% |
SibSp
Real number (ℝ)
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 7 | 5 |
| Distinct (%) | 1.0% | 2.8% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.55337079 | 0.40223464 |
| | Train Dataset | Test Dataset |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 4 |
| Zeros | 484 | 124 |
| Zeros (%) | 68.0% | 69.3% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Quantile statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 3 | 2 |
| Maximum | 8 | 4 |
| Range | 8 | 4 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Standard deviation | 1.1764042 | 0.73070347 |
| Coefficient of variation (CV) | 2.1258877 | 1.81661 |
| Kurtosis | 16.505734 | 7.4108164 |
| Mean | 0.55337079 | 0.40223464 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.6193851 | 2.4416505 |
| Sum | 394 | 72 |
| Variance | 1.3839267 | 0.53392756 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 484 | |
| 1 | 164 | 23.0% |
| 2 | 23 | 3.2% |
| 4 | 16 | 2.2% |
| 3 | 13 | 1.8% |
| 8 | 7 | 1.0% |
| 5 | 5 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 124 | |
| 1 | 45 | 25.1% |
| 2 | 5 | 2.8% |
| 3 | 3 | 1.7% |
| 4 | 2 | 1.1% |
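Because SibSp takes only a handful of values, its sum, mean, and variance can be recomputed directly from the frequency tables. The sketch below assumes the reported variance is the sample variance (n − 1 denominator), which is what the figures match; the function name is illustrative.

```python
# Value -> count, copied from the SibSp frequency tables.
train = {0: 484, 1: 164, 2: 23, 4: 16, 3: 13, 8: 7, 5: 5}
test = {0: 124, 1: 45, 2: 5, 3: 3, 4: 2}


def moments(freq):
    """Sum, mean, and sample variance (n - 1 denominator) from counts."""
    n = sum(freq.values())
    total = sum(value * count for value, count in freq.items())
    mean = total / n
    squared_dev = sum(count * (value - mean) ** 2 for value, count in freq.items())
    return {"n": n, "sum": total, "mean": mean, "variance": squared_dev / (n - 1)}


print(moments(train))
print(moments(test))
```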
| Value | Count | Frequency (%) |
| 0 | 484 | |
| 1 | 164 | 23.0% |
| 2 | 23 | 3.2% |
| 3 | 13 | 1.8% |
| 4 | 16 | 2.2% |
| 5 | 5 | 0.7% |
| 8 | 7 | 1.0% |
| Value | Count | Frequency (%) |
| 0 | 124 | |
| 1 | 45 | 25.1% |
| 2 | 5 | 2.8% |
| 3 | 3 | 1.7% |
| 4 | 2 | 1.1% |
Parch
Real number (ℝ)
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 7 | 6 |
| Distinct (%) | 1.0% | 3.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.37921348 | 0.39106145 |
| | Train Dataset | Test Dataset |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 6 | 5 |
| Zeros | 541 | 137 |
| Zeros (%) | 76.0% | 76.5% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Quantile statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 6 | 5 |
| Range | 6 | 5 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Standard deviation | 0.79166932 | 0.86318491 |
| Coefficient of variation (CV) | 2.0876613 | 2.2072871 |
| Kurtosis | 9.6634025 | 10.119759 |
| Mean | 0.37921348 | 0.39106145 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.695459 | 2.9125135 |
| Sum | 270 | 70 |
| Variance | 0.62674031 | 0.74508819 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 541 | |
| 1 | 94 | 13.2% |
| 2 | 67 | 9.4% |
| 4 | 3 | 0.4% |
| 3 | 3 | 0.4% |
| 5 | 3 | 0.4% |
| 6 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 0 | 137 | |
| 1 | 24 | 13.4% |
| 2 | 13 | 7.3% |
| 3 | 2 | 1.1% |
| 5 | 2 | 1.1% |
| 4 | 1 | 0.6% |
| Value | Count | Frequency (%) |
| 0 | 541 | |
| 1 | 94 | 13.2% |
| 2 | 67 | 9.4% |
| 3 | 3 | 0.4% |
| 4 | 3 | 0.4% |
| 5 | 3 | 0.4% |
| 6 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 0 | 137 | |
| 1 | 24 | 13.4% |
| 2 | 13 | 7.3% |
| 3 | 2 | 1.1% |
| 4 | 1 | 0.6% |
| 5 | 2 | 1.1% |
Ticket
Text
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 558 | 169 |
| Distinct (%) | 78.4% | 94.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Length
| | Train Dataset | Test Dataset |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.7668539 | 6.6871508 |
| Min length | 3 | 3 |
Characters and Unicode
| | Train Dataset | Test Dataset |
|---|---|---|
| Total characters | 4818 | 1197 |
| Distinct characters | 35 | 28 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Train Dataset | Test Dataset |
|---|---|---|
| Unique | 458 | 160 |
| Unique (%) | 64.3% | 89.4% |
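Distinct and Unique measure different things. Assuming the usual definitions, "Distinct" counts the different values present, while "Unique" counts only the values that occur exactly once, which is why Ticket has 558 distinct but 458 unique values in the train set. A toy illustration; the list below is a made-up sample built from ticket numbers that appear in this report, not the real column.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Hypothetical sample: two tickets are shared, three occur once.
tickets = ["347082", "347082", "113043", "28425", "2661", "2661", "350025"]
print(distinct_and_unique(tickets))
```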
Sample
| | Train Dataset | Test Dataset |
|---|---|---|
| 1st row | 113043 | 2661 |
| 2nd row | 28425 | C.A. 18723 |
| 3rd row | STON/O 2. 3101293 | SOTON/O2 3101287 |
| 4th row | 350025 | 248727 |
| 5th row | 347082 | 2651 |
| Value | Count | Frequency (%) |
| pc | 42 | 4.7% |
| c.a | 22 | 2.4% |
| ca | 14 | 1.6% |
| a/5 | 14 | 1.6% |
| 2 | 10 | 1.1% |
| ston/o | 10 | 1.1% |
| sc/paris | 8 | 0.9% |
| soton/oq | 7 | 0.8% |
| 2343 | 7 | 0.8% |
| 347082 | 6 | 0.7% |
| Other values (585) | 763 |
| Value | Count | Frequency (%) |
| pc | 18 | 7.9% |
| c.a | 5 | 2.2% |
| soton/o.q | 4 | 1.8% |
| 347088 | 3 | 1.3% |
| a/5 | 3 | 1.3% |
| w./c | 3 | 1.3% |
| 2661 | 2 | 0.9% |
| ston/o | 2 | 0.9% |
| 17485 | 2 | 0.9% |
| 2 | 2 | 0.9% |
| Other values (176) | 183 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 607 | |
| 1 | 553 | |
| 2 | 493 | |
| 7 | 383 | 7.9% |
| 4 | 365 | 7.6% |
| 0 | 330 | 6.8% |
| 6 | 329 | 6.8% |
| 5 | 317 | 6.6% |
| 9 | 252 | 5.2% |
| 8 | 220 | 4.6% |
| Other values (25) | 969 |
| Value | Count | Frequency (%) |
| 3 | 139 | |
| 1 | 136 | |
| 7 | 107 | |
| 2 | 101 | |
| 4 | 99 | |
| 6 | 93 | 7.8% |
| 9 | 76 | 6.3% |
| 0 | 76 | 6.3% |
| 5 | 70 | 5.8% |
| 8 | 62 | 5.2% |
| Other values (18) | 238 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 3849 | |
| Uppercase Letter | 525 | 10.9% |
| Other Punctuation | 236 | 4.9% |
| Space Separator | 191 | 4.0% |
| Lowercase Letter | 17 | 0.4% |
| Value | Count | Frequency (%) |
| Decimal Number | 959 | |
| Uppercase Letter | 127 | 10.6% |
| Other Punctuation | 59 | 4.9% |
| Space Separator | 48 | 4.0% |
| Lowercase Letter | 4 | 0.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 607 | |
| 1 | 553 | |
| 2 | 493 | |
| 7 | 383 | |
| 4 | 365 | |
| 0 | 330 | |
| 6 | 329 | |
| 5 | 317 | |
| 9 | 252 | |
| 8 | 220 | 5.7% |
| Value | Count | Frequency (%) |
| 3 | 139 | |
| 1 | 136 | |
| 7 | 107 | |
| 2 | 101 | |
| 4 | 99 | |
| 6 | 93 | |
| 9 | 76 | |
| 0 | 76 | |
| 5 | 70 | |
| 8 | 62 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 191 | 100.0% |
| Value | Count | Frequency (%) |
| (space) | 48 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 156 | |
| / | 80 |
| Value | Count | Frequency (%) |
| . | 41 | |
| / | 18 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 118 | |
| O | 77 | |
| P | 74 | |
| A | 72 | |
| S | 60 | |
| N | 33 | 6.3% |
| T | 29 | 5.5% |
| W | 13 | 2.5% |
| Q | 11 | 2.1% |
| I | 11 | 2.1% |
| Other values (6) | 27 | 5.1% |
| Value | Count | Frequency (%) |
| C | 33 | |
| P | 24 | |
| O | 23 | |
| S | 14 | |
| A | 10 | 7.9% |
| T | 7 | 5.5% |
| N | 7 | 5.5% |
| Q | 4 | 3.1% |
| W | 3 | 2.4% |
| H | 1 | 0.8% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 5 | |
| s | 4 | |
| i | 3 | |
| r | 3 | |
| l | 1 | 5.9% |
| e | 1 | 5.9% |
| Value | Count | Frequency (%) |
| a | 1 | |
| r | 1 | |
| i | 1 | |
| s | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 4276 | |
| Latin | 542 | 11.2% |
| Value | Count | Frequency (%) |
| Common | 1066 | |
| Latin | 131 | 10.9% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 607 | |
| 1 | 553 | |
| 2 | 493 | |
| 7 | 383 | |
| 4 | 365 | |
| 0 | 330 | |
| 6 | 329 | |
| 5 | 317 | |
| 9 | 252 | |
| 8 | 220 | 5.1% |
| Other values (3) | 427 |
| Value | Count | Frequency (%) |
| 3 | 139 | |
| 1 | 136 | |
| 7 | 107 | |
| 2 | 101 | |
| 4 | 99 | |
| 6 | 93 | |
| 9 | 76 | |
| 0 | 76 | |
| 5 | 70 | |
| 8 | 62 | |
| Other values (3) | 107 |
Latin
| Value | Count | Frequency (%) |
| C | 118 | |
| O | 77 | |
| P | 74 | |
| A | 72 | |
| S | 60 | |
| N | 33 | 6.1% |
| T | 29 | 5.4% |
| W | 13 | 2.4% |
| Q | 11 | 2.0% |
| I | 11 | 2.0% |
| Other values (12) | 44 | 8.1% |
| Value | Count | Frequency (%) |
| C | 33 | |
| P | 24 | |
| O | 23 | |
| S | 14 | |
| A | 10 | 7.6% |
| T | 7 | 5.3% |
| N | 7 | 5.3% |
| Q | 4 | 3.1% |
| W | 3 | 2.3% |
| a | 1 | 0.8% |
| Other values (5) | 5 | 3.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 4818 |
| Value | Count | Frequency (%) |
| ASCII | 1197 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 607 | |
| 1 | 553 | |
| 2 | 493 | |
| 7 | 383 | 7.9% |
| 4 | 365 | 7.6% |
| 0 | 330 | 6.8% |
| 6 | 329 | 6.8% |
| 5 | 317 | 6.6% |
| 9 | 252 | 5.2% |
| 8 | 220 | 4.6% |
| Other values (25) | 969 |
| Value | Count | Frequency (%) |
| 3 | 139 | |
| 1 | 136 | |
| 7 | 107 | |
| 2 | 101 | |
| 4 | 99 | |
| 6 | 93 | 7.8% |
| 9 | 76 | 6.3% |
| 0 | 76 | 6.3% |
| 5 | 70 | 5.8% |
| 8 | 62 | 5.2% |
| Other values (18) | 238 |
Fare
Real number (ℝ)
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 220 | 107 |
| Distinct (%) | 30.9% | 59.8% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 32.586276 | 30.684473 |
| | Train Dataset | Test Dataset |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 262.375 |
| Zeros | 13 | 2 |
| Zeros (%) | 1.8% | 1.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Quantile statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.215 |
| Q1 | 7.925 | 7.8958 |
| median | 14.4542 | 14.5 |
| Q3 | 30.5 | 32.4104 |
| 95-th percentile | 116.30125 | 95.23833 |
| Maximum | 512.3292 | 262.375 |
| Range | 512.3292 | 262.375 |
| Interquartile range (IQR) | 22.575 | 24.5146 |
Descriptive statistics
| | Train Dataset | Test Dataset |
|---|---|---|
| Standard deviation | 51.969529 | 39.447725 |
| Coefficient of variation (CV) | 1.5948287 | 1.2855924 |
| Kurtosis | 33.679535 | 13.842715 |
| Mean | 32.586276 | 30.684473 |
| Median Absolute Deviation (MAD) | 6.8042 | 7.25 |
| Skewness | 4.8750656 | 3.2942177 |
| Sum | 23201.429 | 5492.5207 |
| Variance | 2700.832 | 1556.123 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 8.05 | 35 | 4.9% |
| 13 | 33 | 4.6% |
| 7.8958 | 32 | 4.5% |
| 7.75 | 26 | 3.7% |
| 26 | 25 | 3.5% |
| 10.5 | 17 | 2.4% |
| 7.925 | 17 | 2.4% |
| 0 | 13 | 1.8% |
| 7.2292 | 13 | 1.8% |
| 8.6625 | 13 | 1.8% |
| Other values (210) | 488 |
| Value | Count | Frequency (%) |
| 13 | 9 | 5.0% |
| 7.75 | 8 | 4.5% |
| 8.05 | 8 | 4.5% |
| 10.5 | 7 | 3.9% |
| 26 | 6 | 3.4% |
| 7.8958 | 6 | 3.4% |
| 7.8542 | 4 | 2.2% |
| 7.05 | 4 | 2.2% |
| 15.2458 | 3 | 1.7% |
| 7.775 | 3 | 1.7% |
| Other values (97) | 121 |
| Value | Count | Frequency (%) |
| 0 | 13 | |
| 4.0125 | 1 | 0.1% |
| 5 | 1 | 0.1% |
| 6.2375 | 1 | 0.1% |
| 6.4375 | 1 | 0.1% |
| 6.45 | 1 | 0.1% |
| 6.4958 | 2 | 0.3% |
| 6.75 | 2 | 0.3% |
| 6.8583 | 1 | 0.1% |
| 6.95 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 0 | 2 | |
| 7.0458 | 1 | 0.6% |
| 7.05 | 4 | |
| 7.125 | 2 | |
| 7.225 | 2 | |
| 7.2292 | 2 | |
| 7.25 | 3 | |
| 7.4958 | 1 | 0.6% |
| 7.55 | 2 | |
| 7.7292 | 1 | 0.6% |
Cabin
Text
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 117 | 42 |
| Distinct (%) | 73.6% | 93.3% |
| Missing | 553 | 134 |
| Missing (%) | 77.7% | 74.9% |
| Memory size | 11.1 KiB | 2.8 KiB |
Length
| | Train Dataset | Test Dataset |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.6289308 | 3.4444444 |
| Min length | 1 | 1 |
Characters and Unicode
| | Train Dataset | Test Dataset |
|---|---|---|
| Total characters | 577 | 155 |
| Distinct characters | 19 | 18 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Train Dataset | Test Dataset |
|---|---|---|
| Unique | 82 | 39 |
| Unique (%) | 51.6% | 86.7% |
Sample
| | Train Dataset | Test Dataset |
|---|---|---|
| 1st row | C124 | D47 |
| 2nd row | B58 B60 | C123 |
| 3rd row | B38 | D28 |
| 4th row | C52 | D19 |
| 5th row | C93 | C110 |
| Value | Count | Frequency (%) |
| c23 | 4 | 2.1% |
| c27 | 4 | 2.1% |
| c25 | 4 | 2.1% |
| f | 4 | 2.1% |
| c26 | 3 | 1.6% |
| e101 | 3 | 1.6% |
| f2 | 3 | 1.6% |
| c22 | 3 | 1.6% |
| g6 | 3 | 1.6% |
| b98 | 3 | 1.6% |
| Other values (120) | 153 |
| Value | Count | Frequency (%) |
| c126 | 2 | 3.9% |
| e25 | 2 | 3.9% |
| d | 2 | 3.9% |
| e34 | 1 | 2.0% |
| f33 | 1 | 2.0% |
| d19 | 1 | 2.0% |
| c110 | 1 | 2.0% |
| a6 | 1 | 2.0% |
| d48 | 1 | 2.0% |
| b69 | 1 | 2.0% |
| Other values (38) | 38 |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 64 | |
| 2 | 61 | |
| B | 50 | 8.7% |
| 3 | 48 | 8.3% |
| 1 | 48 | 8.3% |
| 6 | 36 | 6.2% |
| 5 | 36 | 6.2% |
| 4 | 31 | 5.4% |
| 8 | 29 | 5.0% |
| (space) | 28 | 4.9% |
| Other values (9) | 146 |
| Value | Count | Frequency (%) |
| 6 | 15 | 9.7% |
| D | 15 | 9.7% |
| B | 14 | 9.0% |
| 1 | 13 | 8.4% |
| 2 | 11 | 7.1% |
| 3 | 11 | 7.1% |
| 9 | 10 | 6.5% |
| 5 | 9 | 5.8% |
| E | 8 | 5.2% |
| 7 | 8 | 5.2% |
| Other values (8) | 41 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 362 | |
| Uppercase Letter | 187 | |
| Space Separator | 28 | 4.9% |
| Value | Count | Frequency (%) |
| Decimal Number | 98 | |
| Uppercase Letter | 51 | |
| Space Separator | 6 | 3.9% |
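The category names in these tables match Unicode general categories (`Nd`, `Lu`, `Zs`). As a minimal sketch of how such a breakdown could be reproduced, the snippet below classifies characters with Python's standard `unicodedata` module. This is an assumption about the method, not a description of how ydata-profiling computes it; the function name `category_counts` and the name mapping are illustrative.

```python
import unicodedata
from collections import Counter

# Long names for the three general categories that appear in the Cabin column.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category, skipping missing values."""
    counts = Counter()
    for value in values:
        if value is None:  # missing cell, contributes no characters
            continue
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five train sample values from the Sample table, plus one missing cell.
sample = ["C124", "B58 B60", "B38", "C52", "C93", None]
counts = category_counts(sample)
print(counts["Decimal Number"], counts["Uppercase Letter"], counts["Space Separator"])
# 13 6 1
```

The single space comes from the multi-cabin value `B58 B60`, which is also why Space Separator shows up as a category for this column at all.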
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 64 | |
| B | 50 | |
| E | 25 | 13.4% |
| D | 19 | 10.2% |
| F | 12 | 6.4% |
| A | 10 | 5.3% |
| G | 6 | 3.2% |
| T | 1 | 0.5% |
| Value | Count | Frequency (%) |
| D | 15 | |
| B | 14 | |
| E | 8 | |
| C | 7 | |
| A | 5 | 9.8% |
| F | 1 | 2.0% |
| G | 1 | 2.0% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 61 | |
| 3 | 48 | |
| 1 | 48 | |
| 6 | 36 | |
| 5 | 36 | |
| 4 | 31 | |
| 8 | 29 | |
| 7 | 26 | |
| 0 | 24 | 6.6% |
| 9 | 23 | 6.4% |
| Value | Count | Frequency (%) |
| 6 | 15 | |
| 1 | 13 | |
| 2 | 11 | |
| 3 | 11 | |
| 9 | 10 | |
| 5 | 9 | |
| 7 | 8 | |
| 8 | 8 | |
| 0 | 7 | |
| 4 | 6 | 6.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 28 | |
| Value | Count | Frequency (%) |
| (space) | 6 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 390 | |
| Latin | 187 |
| Value | Count | Frequency (%) |
| Common | 104 | |
| Latin | 51 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| C | 64 | |
| B | 50 | |
| E | 25 | 13.4% |
| D | 19 | 10.2% |
| F | 12 | 6.4% |
| A | 10 | 5.3% |
| G | 6 | 3.2% |
| T | 1 | 0.5% |
| Value | Count | Frequency (%) |
| D | 15 | |
| B | 14 | |
| E | 8 | |
| C | 7 | |
| A | 5 | 9.8% |
| F | 1 | 2.0% |
| G | 1 | 2.0% |
Common
| Value | Count | Frequency (%) |
| 2 | 61 | |
| 3 | 48 | |
| 1 | 48 | |
| 6 | 36 | |
| 5 | 36 | |
| 4 | 31 | |
| 8 | 29 | |
| (space) | 28 | |
| 7 | 26 | |
| 0 | 24 | 6.2% |
| Value | Count | Frequency (%) |
| 6 | 15 | |
| 1 | 13 | |
| 2 | 11 | |
| 3 | 11 | |
| 9 | 10 | |
| 5 | 9 | |
| 7 | 8 | |
| 8 | 8 | |
| 0 | 7 | |
| (space) | 6 | 5.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 577 |
| Value | Count | Frequency (%) |
| ASCII | 155 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| C | 64 | |
| 2 | 61 | |
| B | 50 | 8.7% |
| 3 | 48 | 8.3% |
| 1 | 48 | 8.3% |
| 6 | 36 | 6.2% |
| 5 | 36 | 6.2% |
| 4 | 31 | 5.4% |
| 8 | 29 | 5.0% |
| (space) | 28 | 4.9% |
| Other values (9) | 146 |
| Value | Count | Frequency (%) |
| 6 | 15 | 9.7% |
| D | 15 | 9.7% |
| B | 14 | 9.0% |
| 1 | 13 | 8.4% |
| 2 | 11 | 7.1% |
| 3 | 11 | 7.1% |
| 9 | 10 | 6.5% |
| 5 | 9 | 5.8% |
| E | 8 | 5.2% |
| 7 | 8 | 5.2% |
| Other values (8) | 41 |
Embarked
Categorical
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.4% | 1.7% |
| Missing | 2 | 0 |
| Missing (%) | 0.3% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Length
| | Train Dataset | Test Dataset |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Train Dataset | Test Dataset |
|---|---|---|
| Total characters | 710 | 179 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Train Dataset | Test Dataset |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Train Dataset | Test Dataset |
|---|---|---|
| 1st row | S | C |
| 2nd row | S | S |
| 3rd row | S | S |
| 4th row | S | S |
| 5th row | S | C |
Common Values
Train Dataset
| Value | Count | Frequency (%) |
|---|---|---|
| S | 525 | 73.7% |
| C | 125 | 17.6% |
| Q | 60 | 8.4% |
| (Missing) | 2 | 0.3% |
Test Dataset
| Value | Count | Frequency (%) |
|---|---|---|
| S | 119 | 66.5% |
| C | 43 | 24.0% |
| Q | 17 | 9.5% |
[Plots omitted: Length; Common Values (Train Dataset, Test Dataset)]
| Value | Count | Frequency (%) |
| s | 525 | |
| c | 125 | 17.6% |
| q | 60 | 8.5% |
| Value | Count | Frequency (%) |
| s | 119 | |
| c | 43 | 24.0% |
| q | 17 | 9.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 525 | |
| C | 125 | 17.6% |
| Q | 60 | 8.5% |
| Value | Count | Frequency (%) |
| S | 119 | |
| C | 43 | 24.0% |
| Q | 17 | 9.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 710 |
| Value | Count | Frequency (%) |
| Uppercase Letter | 179 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 525 | |
| C | 125 | 17.6% |
| Q | 60 | 8.5% |
| Value | Count | Frequency (%) |
| S | 119 | |
| C | 43 | 24.0% |
| Q | 17 | 9.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 710 |
| Value | Count | Frequency (%) |
| Latin | 179 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 525 | |
| C | 125 | 17.6% |
| Q | 60 | 8.5% |
| Value | Count | Frequency (%) |
| S | 119 | |
| C | 43 | 24.0% |
| Q | 17 | 9.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 710 |
| Value | Count | Frequency (%) |
| ASCII | 179 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 525 | |
| C | 125 | 17.6% |
| Q | 60 | 8.5% |
| Value | Count | Frequency (%) |
| S | 119 | |
| C | 43 | 24.0% |
| Q | 17 | 9.5% |
Survived
Categorical
| | Train Dataset | Test Dataset |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.3% | 1.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 11.1 KiB | 2.8 KiB |
Length
| | Train Dataset | Test Dataset |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Train Dataset | Test Dataset |
|---|---|---|
| Total characters | 712 | 179 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Train Dataset | Test Dataset |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Train Dataset | Test Dataset |
|---|---|---|
| 1st row | 0 | 1 |
| 2nd row | 0 | 0 |
| 3rd row | 0 | 0 |
| 4th row | 0 | 1 |
| 5th row | 0 | 1 |
Common Values
Train Dataset
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 444 | 62.4% |
| 1 | 268 | 37.6% |
Test Dataset
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 105 | 58.7% |
| 1 | 74 | 41.3% |
[Plots omitted: Length; Common Values (Train Dataset, Test Dataset)]
| Value | Count | Frequency (%) |
| 0 | 444 | |
| 1 | 268 |
| Value | Count | Frequency (%) |
| 0 | 105 | |
| 1 | 74 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 444 | |
| 1 | 268 |
| Value | Count | Frequency (%) |
| 0 | 105 | |
| 1 | 74 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 712 |
| Value | Count | Frequency (%) |
| Decimal Number | 179 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 444 | |
| 1 | 268 |
| Value | Count | Frequency (%) |
| 0 | 105 | |
| 1 | 74 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 712 |
| Value | Count | Frequency (%) |
| Common | 179 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 444 | |
| 1 | 268 |
| Value | Count | Frequency (%) |
| 0 | 105 | |
| 1 | 74 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 712 |
| Value | Count | Frequency (%) |
| ASCII | 179 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 444 | |
| 1 | 268 |
| Value | Count | Frequency (%) |
| 0 | 105 | |
| 1 | 74 |
Train Dataset
| | PassengerId | Age | SibSp | Parch | Fare | Pclass | Sex | Embarked | Survived |
|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.027 | -0.081 | 0.000 | -0.008 | 0.069 | 0.042 | 0.000 | 0.115 |
| Age | 0.027 | 1.000 | -0.189 | -0.263 | 0.121 | 0.250 | 0.072 | 0.000 | 0.115 |
| SibSp | -0.081 | -0.189 | 1.000 | 0.466 | 0.460 | 0.156 | 0.185 | 0.081 | 0.162 |
| Parch | 0.000 | -0.263 | 0.466 | 1.000 | 0.417 | 0.000 | 0.246 | 0.000 | 0.164 |
| Fare | -0.008 | 0.121 | 0.460 | 0.417 | 1.000 | 0.488 | 0.188 | 0.184 | 0.271 |
| Pclass | 0.069 | 0.250 | 0.156 | 0.000 | 0.488 | 1.000 | 0.122 | 0.224 | 0.321 |
| Sex | 0.042 | 0.072 | 0.185 | 0.246 | 0.188 | 0.122 | 1.000 | 0.076 | 0.538 |
| Embarked | 0.000 | 0.000 | 0.081 | 0.000 | 0.184 | 0.224 | 0.076 | 1.000 | 0.154 |
| Survived | 0.115 | 0.115 | 0.162 | 0.164 | 0.271 | 0.321 | 0.538 | 0.154 | 1.000 |
Test Dataset
| | PassengerId | Age | SibSp | Parch | Fare | Pclass | Sex | Embarked | Survived |
|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.102 | 0.027 | 0.009 | -0.036 | 0.139 | 0.000 | 0.000 | 0.000 |
| Age | 0.102 | 1.000 | -0.152 | -0.214 | 0.193 | 0.243 | 0.146 | 0.000 | 0.171 |
| SibSp | 0.027 | -0.152 | 1.000 | 0.387 | 0.399 | 0.000 | 0.307 | 0.090 | 0.256 |
| Parch | 0.009 | -0.214 | 0.387 | 1.000 | 0.385 | 0.000 | 0.284 | 0.109 | 0.169 |
| Fare | -0.036 | 0.193 | 0.399 | 0.385 | 1.000 | 0.520 | 0.232 | 0.277 | 0.368 |
| Pclass | 0.139 | 0.243 | 0.000 | 0.000 | 0.520 | 1.000 | 0.119 | 0.363 | 0.384 |
| Sex | 0.000 | 0.146 | 0.307 | 0.284 | 0.232 | 0.119 | 1.000 | 0.207 | 0.532 |
| Embarked | 0.000 | 0.000 | 0.090 | 0.109 | 0.277 | 0.363 | 0.207 | 1.000 | 0.177 |
| Survived | 0.000 | 0.171 | 0.256 | 0.169 | 0.368 | 0.384 | 0.532 | 0.177 | 1.000 |
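The two matrices can be scanned for strongly associated pairs with a few lines of Python. The sketch below copies a handful of off-diagonal values from the tables above and keeps the pairs at or above a cutoff. The cutoff of 0.5 is an assumption: it is consistent with the High Correlation alerts near the top of the report (Fare/Pclass is flagged only for the test split), but this section does not state the value used. The function name `high_pairs` is illustrative.

```python
THRESHOLD = 0.5  # assumed cutoff, not stated in this section of the report

# Selected off-diagonal entries copied from the two correlation tables.
train = {
    ("Fare", "Pclass"): 0.488,
    ("Sex", "Survived"): 0.538,
    ("Pclass", "Survived"): 0.321,
    ("SibSp", "Parch"): 0.466,
}
test = {
    ("Fare", "Pclass"): 0.520,
    ("Sex", "Survived"): 0.532,
    ("Pclass", "Survived"): 0.384,
    ("SibSp", "Parch"): 0.387,
}


def high_pairs(corr, threshold=THRESHOLD):
    """Return the variable pairs whose absolute correlation reaches the threshold."""
    return sorted(pair for pair, value in corr.items() if abs(value) >= threshold)


print(high_pairs(train))  # [('Sex', 'Survived')]
print(high_pairs(test))   # [('Fare', 'Pclass'), ('Sex', 'Survived')]
```

Fare/Pclass sits close to the assumed cutoff in both splits (0.488 vs. 0.520), so the fact that it is flagged only for the test split may be a sampling effect rather than a real difference between the datasets.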
Train Dataset
| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 331 | 332 | 1 | Partner, Mr. Austen | male | 45.5 | 0 | 0 | 113043 | 28.5000 | C124 | S | 0 |
| 733 | 734 | 2 | Berriman, Mr. William John | male | 23.0 | 0 | 0 | 28425 | 13.0000 | NaN | S | 0 |
| 382 | 383 | 3 | Tikkanen, Mr. Juho | male | 32.0 | 0 | 0 | STON/O 2. 3101293 | 7.9250 | NaN | S | 0 |
| 704 | 705 | 3 | Hansen, Mr. Henrik Juul | male | 26.0 | 1 | 0 | 350025 | 7.8542 | NaN | S | 0 |
| 813 | 814 | 3 | Andersson, Miss. Ebba Iris Alfrida | female | 6.0 | 4 | 2 | 347082 | 31.2750 | NaN | S | 0 |
| 118 | 119 | 1 | Baxter, Mr. Quigg Edmond | male | 24.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C | 0 |
| 536 | 537 | 1 | Butt, Major. Archibald Willingham | male | 45.0 | 0 | 0 | 113050 | 26.5500 | B38 | S | 0 |
| 361 | 362 | 2 | del Carlo, Mr. Sebastiano | male | 29.0 | 1 | 0 | SC/PARIS 2167 | 27.7208 | NaN | C | 0 |
| 29 | 30 | 3 | Todoroff, Mr. Lalio | male | NaN | 0 | 0 | 349216 | 7.8958 | NaN | S | 0 |
| 55 | 56 | 1 | Woolner, Mr. Hugh | male | NaN | 0 | 0 | 19947 | 35.5000 | C52 | S | 1 |
Test Dataset
| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 709 | 710 | 3 | Moubarek, Master. Halim Gonios ("William George") | male | NaN | 1 | 1 | 2661 | 15.2458 | NaN | C | 1 |
| 439 | 440 | 2 | Kvillner, Mr. Johan Henrik Johannesson | male | 31.0 | 0 | 0 | C.A. 18723 | 10.5000 | NaN | S | 0 |
| 840 | 841 | 3 | Alhomaki, Mr. Ilmari Rudolf | male | 20.0 | 0 | 0 | SOTON/O2 3101287 | 7.9250 | NaN | S | 0 |
| 720 | 721 | 2 | Harper, Miss. Annie Jessie "Nina" | female | 6.0 | 0 | 1 | 248727 | 33.0000 | NaN | S | 1 |
| 39 | 40 | 3 | Nicola-Yarred, Miss. Jamila | female | 14.0 | 1 | 0 | 2651 | 11.2417 | NaN | C | 1 |
| 290 | 291 | 1 | Barber, Miss. Ellen "Nellie" | female | 26.0 | 0 | 0 | 19877 | 78.8500 | NaN | S | 1 |
| 300 | 301 | 3 | Kelly, Miss. Anna Katherine "Annie Kate" | female | NaN | 0 | 0 | 9234 | 7.7500 | NaN | Q | 1 |
| 333 | 334 | 3 | Vander Planke, Mr. Leo Edmondus | male | 16.0 | 2 | 0 | 345764 | 18.0000 | NaN | S | 0 |
| 208 | 209 | 3 | Carr, Miss. Helen "Ellen" | female | 16.0 | 0 | 0 | 367231 | 7.7500 | NaN | Q | 1 |
| 136 | 137 | 1 | Newsom, Miss. Helen Monypeny | female | 19.0 | 0 | 2 | 11752 | 26.2833 | D47 | S | 1 |
Train Dataset
| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 121 | 122 | 3 | Moore, Mr. Leonard Charles | male | NaN | 0 | 0 | A4. 54510 | 8.0500 | NaN | S | 0 |
| 614 | 615 | 3 | Brocklebank, Mr. William Alfred | male | 35.0 | 0 | 0 | 364512 | 8.0500 | NaN | S | 0 |
| 20 | 21 | 2 | Fynney, Mr. Joseph J | male | 35.0 | 0 | 0 | 239865 | 26.0000 | NaN | S | 0 |
| 700 | 701 | 1 | Astor, Mrs. John Jacob (Madeleine Talmadge Force) | female | 18.0 | 1 | 0 | PC 17757 | 227.5250 | C62 C64 | C | 1 |
| 71 | 72 | 3 | Goodwin, Miss. Lillian Amy | female | 16.0 | 5 | 2 | CA 2144 | 46.9000 | NaN | S | 0 |
| 106 | 107 | 3 | Salkjelsvik, Miss. Anna Kristine | female | 21.0 | 0 | 0 | 343120 | 7.6500 | NaN | S | 1 |
| 270 | 271 | 1 | Cairns, Mr. Alexander | male | NaN | 0 | 0 | 113798 | 31.0000 | NaN | S | 0 |
| 860 | 861 | 3 | Hansen, Mr. Claus Peter | male | 41.0 | 2 | 0 | 350026 | 14.1083 | NaN | S | 0 |
| 435 | 436 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S | 1 |
| 102 | 103 | 1 | White, Mr. Richard Frasar | male | 21.0 | 0 | 1 | 35281 | 77.2875 | D26 | S | 0 |
Test Dataset
| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 363 | 364 | 3 | Asim, Mr. Adola | male | 35.0 | 0 | 0 | SOTON/O.Q. 3101310 | 7.0500 | NaN | S | 0 |
| 97 | 98 | 1 | Greenfield, Mr. William Bertram | male | 23.0 | 0 | 1 | PC 17759 | 63.3583 | D10 D12 | C | 1 |
| 417 | 418 | 2 | Silven, Miss. Lyyli Karoliina | female | 18.0 | 0 | 2 | 250652 | 13.0000 | NaN | S | 1 |
| 572 | 573 | 1 | Flynn, Mr. John Irwin ("Irving") | male | 36.0 | 0 | 0 | PC 17474 | 26.3875 | E25 | S | 1 |
| 852 | 853 | 3 | Boulos, Miss. Nourelain | female | 9.0 | 1 | 1 | 2678 | 15.2458 | NaN | C | 0 |
| 433 | 434 | 3 | Kallio, Mr. Nikolai Erland | male | 17.0 | 0 | 0 | STON/O 2. 3101274 | 7.1250 | NaN | S | 0 |
| 773 | 774 | 3 | Elias, Mr. Dibo | male | NaN | 0 | 0 | 2674 | 7.2250 | NaN | C | 0 |
| 25 | 26 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) | female | 38.0 | 1 | 5 | 347077 | 31.3875 | NaN | S | 1 |
| 84 | 85 | 2 | Ilett, Miss. Bertha | female | 17.0 | 0 | 0 | SO/C 14885 | 10.5000 | NaN | S | 1 |
| 10 | 11 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | G6 | S | 1 |
Train Dataset
Dataset does not contain duplicate rows.
Test Dataset
Dataset does not contain duplicate rows.